ggplot2
ggplot2 is one of the most elegant and aesthetically pleasing graphics frameworks available in R. The way you make plots in ggplot2 is very different from base graphics, which makes the learning curve steep. That said, it’s totally worth it.
# Within each document, load the ggplot2 package so R knows you will be using functions, data, etc. from that package
library(ggplot2)
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ✔ purrr 0.3.3
## ── Conflicts ─────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
It’s essential that you properly organize your data into a data frame before you start with ggplot2. This is why we spent the last week or two learning ways to transform and wrangle data into different formats.
Once your data is ready to go, you gradually add bits and pieces to it to create a plot. Plots are built up in layers, each one added on top of the last.
We will be working with the dataset mpg.
data(mpg)
# ggplot ( dataframe, aes(x=xvariable, y=yvariable))
ggplot(mpg, aes(cty, hwy))
A blank ggplot is drawn. Even though x and y are specified, there are no points or lines in it. This is because ggplot doesn’t assume that you meant a scatterplot or a line chart to be drawn. I have only told ggplot what dataset to use and which columns should be used for the x and y axes. I haven’t explicitly asked it to draw any points.
The basics:
ggplot(mpg, aes(cty, hwy)) +
geom_point()
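To make the layering idea concrete, the sketch below inspects the plot object before and after geom_point() is added. The names p_blank and p_points are illustrative, and reading the layers element directly is an assumption about the object's internals that may differ between ggplot2 versions.

```r
library(ggplot2)

# A plot with data and aesthetics only: no geom has been added yet
p_blank <- ggplot(mpg, aes(cty, hwy))

# Adding a geom appends one layer to the plot object
p_points <- p_blank + geom_point()

# Count the layers stored in each object
length(p_blank$layers)
length(p_points$layers)
```

If the object behaves as assumed, the blank plot reports no layers and the scatterplot reports one.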
To customize colors, plotting characters, size:
ggplot(mpg, aes(cty, hwy)) +
geom_point(col="steelblue", pch=1, size=2)
A list of possible pch values
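One way to see the available plotting characters is to draw them. This is a rough sketch that assumes the standard shape codes 0 through 25; the data frame shapes and its grid layout are made up purely for display.

```r
library(ggplot2)

# Hypothetical helper data: one row per shape code, laid out on a 7-column grid
shapes <- data.frame(pch = 0:25)
shapes$x <- shapes$pch %% 7
shapes$y <- -(shapes$pch %/% 7)

p_shapes <- ggplot(shapes, aes(x, y)) +
  # scale_shape_identity() uses the numbers in pch as shape codes directly
  geom_point(aes(shape = pch), size = 4, fill = "steelblue") +
  scale_shape_identity() +
  # Print each code just below its symbol
  geom_text(aes(label = pch), nudge_y = -0.35, size = 3) +
  theme_void()
p_shapes
```

Shapes 21 through 25 have a separate outline and interior, so they are the only ones that respond to fill.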
Let’s make a scatterplot on top of the blank ggplot by adding points using a geom layer called geom_point.
ggplot(mpg, aes(cty, hwy)) +
geom_point(col="steelblue", size=2) +
labs(title="City MPG vs. Highway MPG",
subtitle="this is a subtitle",
x="City MPG",
y="Highway MPG",
caption="source: mpg dataset")
gg <- ggplot(mpg, aes(cty, hwy)) +
geom_point(aes(col=class), size=2) +
labs(title="City MPG vs. Highway MPG",
subtitle="this is a subtitle",
x="City MPG",
y="Highway MPG",
caption="source: mpg dataset")
gg
As an added benefit, the legend is added automatically. If needed, it can be removed by setting legend.position to "none" (lowercase) from within a theme() function.
gg + theme(legend.position="none")
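The same theme() argument can also move the legend rather than remove it. A minimal sketch, rebuilding a simplified gg so the block runs on its own (gg_bottom is an illustrative name):

```r
library(ggplot2)

# Simplified version of the plot above, without the labels
gg <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(aes(col = class), size = 2)

# Place the legend below the plot instead of removing it
gg_bottom <- gg + theme(legend.position = "bottom")
gg_bottom
```

Other accepted positions include "top", "left", and "right".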
You can also change the color palette entirely.
gg + scale_colour_brewer(palette="Spectral")
More such palettes can be found in the RColorBrewer package.
RColorBrewer palettes
You can also build your own color palettes using the built-in colors in R or by using hex codes (i.e., #RRGGBB).
R Built In Colors
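As a sketch of a hand-built palette, the block below assigns one color to each vehicle class, mixing built-in color names with hex codes. The specific colors and the name class_cols are arbitrary choices for illustration, not recommendations.

```r
library(ggplot2)

# Illustrative palette: one colour per class, named so each value maps to a known level
class_cols <- c(
  "2seater"    = "firebrick",
  "compact"    = "steelblue",
  "midsize"    = "#1B9E77",
  "minivan"    = "darkorange",
  "pickup"     = "#7570B3",
  "subcompact" = "gold",
  "suv"        = "grey40"
)

p_manual <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(aes(col = class), size = 2) +
  # scale_colour_manual() replaces the default palette with the supplied values
  scale_colour_manual(values = class_cols)
p_manual
```

Naming the vector elements ties each color to a specific class, so the mapping does not depend on the order of the levels.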
We will spend more time later in the course discussing best practices for color choices, but for now keep in mind:
ggplot(mpg, aes(cty, hwy, label=model)) +
geom_point(aes(col=class), size=2) +
labs(title="City MPG vs. Highway MPG",
subtitle="this is a subtitle",
x="City MPG",
y="Highway MPG",
caption="source: mpg dataset") +
geom_text(size=2)
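With this many points the labels pile up on top of each other. One possible remedy, sketched here, is to let geom_text() skip labels that would overlap and to nudge the text off the points; p_labels is an illustrative name.

```r
library(ggplot2)

p_labels <- ggplot(mpg, aes(cty, hwy, label = model)) +
  geom_point(aes(col = class), size = 2) +
  # check_overlap drops any label that would overlap one drawn earlier;
  # vjust shifts the text above its point
  geom_text(size = 2, vjust = -1, check_overlap = TRUE)
p_labels
```

The trade-off is that some points end up unlabeled, so this suits exploration better than a final figure.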
Themes can be a useful way to “style” an entire graph at once. Common themes are theme_classic(), theme_dark(), theme_bw(), and theme_grey().
gg + theme_grey()
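When a complete theme is combined with individual theme() tweaks, the order matters. This sketch rebuilds a simplified gg so it runs on its own; p_ok and p_reset are illustrative names.

```r
library(ggplot2)

# Simplified version of the plot above, without the labels
gg <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(aes(col = class), size = 2)

# Complete theme first, then the tweak: the legend moves to the bottom
p_ok <- gg + theme_bw() + theme(legend.position = "bottom")

# Tweak first, then the complete theme: theme_bw() replaces the earlier setting
p_reset <- gg + theme(legend.position = "bottom") + theme_bw()

p_ok
p_reset
```

A reasonable habit is to add the complete theme first and put any theme() adjustments after it.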
The ggthemes package contains lots of additional themes, including theme_wsj() (Wall Street Journal), theme_economist() (The Economist), theme_fivethirtyeight() (FiveThirtyEight), etc.
# Make sure you have run install.packages("ggthemes") on your computer at some point
library(ggthemes)
gg + theme_fivethirtyeight()
Histograms should be used for one continuous variable.
#hist(mpg$cty)
ggplot(mpg, aes(cty)) +
geom_histogram(binwidth=2)
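The shape of a histogram depends on how the bins are chosen. As a small sketch, geom_histogram() accepts either a bin width or a bin count; the value 15 below is an arbitrary choice for illustration.

```r
library(ggplot2)

# binwidth fixes the width of each bar in data units (here, 2 mpg)
p_width <- ggplot(mpg, aes(cty)) +
  geom_histogram(binwidth = 2)

# bins fixes the number of bars and lets ggplot2 work out the width
p_bins <- ggplot(mpg, aes(cty)) +
  geom_histogram(bins = 15)

p_width
p_bins

# Either way, the bar heights add up to the number of rows in mpg
sum(layer_data(p_bins)$count) == nrow(mpg)
```

Trying a few values is usually worthwhile, since very wide bins hide structure and very narrow ones exaggerate noise.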
Boxplots should be used for one continuous variable. Side-by-side boxplots can be good for comparing a numerical variable across many different levels (categories).
mpg %>%
mutate(class = reorder(class, cty, FUN=median)) %>%
ggplot(aes(x=class, y=cty)) +
geom_boxplot(fill="steelblue", outlier.size = 0)
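The mutate() step is what sorts the boxes by their medians. To see what reorder() does on its own, this sketch applies it outside the pipeline; class_by_cty and class_medians are illustrative names.

```r
library(ggplot2)

# reorder() returns a factor whose levels are sorted by a summary of another variable
class_by_cty <- reorder(mpg$class, mpg$cty, FUN = median)
levels(class_by_cty)

# The per-class medians, listed in level order, should never decrease
class_medians <- tapply(mpg$cty, class_by_cty, median)
class_medians
```

ggplot2 draws discrete axes in level order, which is why reordering the factor reorders the boxes.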
Barplots should be used for one or two categorical variables.
mpg %>%
mutate(unit = 1) %>%
mutate(manufacturer = reorder(manufacturer, unit, FUN=sum)) %>%
ggplot(aes(x=manufacturer)) +
geom_bar() +
labs(title="Barplot of One Categorical Variable", subtitle="Manufacturers of Cars") +
theme(axis.text.x=element_text(angle=90))
#coord_flip()
mpg %>%
mutate(unit = 1) %>%
mutate(manufacturer = reorder(manufacturer, unit, FUN=sum)) %>%
ggplot(aes(x=manufacturer)) +
geom_bar() +
# the trailing + keeps labs() and theme() attached to the plot
geom_text(stat='count', aes(label=..count..), vjust=-1) +
labs(title="Barplot of One Categorical Variable", subtitle="Manufacturers of Cars") +
theme(axis.text.x=element_text(angle=90))
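An alternative to letting geom_bar() do the counting is to compute the counts first and plot them with geom_col(). This is a sketch of that approach, not a replacement for the code above; maker_counts and p_col are illustrative names.

```r
library(ggplot2)
library(dplyr)

# Count rows per manufacturer; count() puts the result in a column named n
maker_counts <- mpg %>%
  count(manufacturer)

p_col <- maker_counts %>%
  # Sort the bars by their precomputed counts
  ggplot(aes(x = reorder(manufacturer, n), y = n)) +
  geom_col() +
  # The counts are an ordinary column now, so they can be used as labels directly
  geom_text(aes(label = n), vjust = -0.5) +
  labs(x = "manufacturer", y = "count") +
  theme(axis.text.x = element_text(angle = 90))
p_col
```

Having the counts in a data frame also makes it easy to check them before plotting.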
mpg %>%
ggplot(aes(manufacturer)) +
geom_bar(aes(fill=class)) +
coord_flip() +
scale_fill_brewer(palette="Spectral")
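By default, the bars for a second categorical variable are stacked. The position argument offers other arrangements, sketched below with illustrative names p_dodge and p_fill.

```r
library(ggplot2)

# position = "dodge" places the classes side by side within each manufacturer
p_dodge <- ggplot(mpg, aes(manufacturer)) +
  geom_bar(aes(fill = class), position = "dodge") +
  coord_flip()

# position = "fill" rescales every bar to height 1 so the segments show proportions
p_fill <- ggplot(mpg, aes(manufacturer)) +
  geom_bar(aes(fill = class), position = "fill") +
  coord_flip() +
  labs(y = "proportion")

p_dodge
p_fill
```

Stacked bars emphasize totals, dodged bars make individual classes easier to compare, and filled bars compare the mix within each manufacturer.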
There are so many different ways to modify the themes: the legend, where the axis ticks go, the background colors, the position of text, the font, etc. You can get the full scope of all the options by typing ?theme into the console.
scale_color_brewer() is for points, lines, etc. scale_fill_brewer() is for barplots, boxplots, etc.
scale_color_manual() is for points, lines, etc. scale_fill_manual() is for barplots, boxplots, etc.
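The difference between the color and fill scales can be sketched with the same hand-picked palette applied to two kinds of geoms. The vector drv_cols and its colors are arbitrary choices for illustration.

```r
library(ggplot2)

# Illustrative palette for the three drive types in mpg
drv_cols <- c("4" = "steelblue", "f" = "darkorange", "r" = "grey40")

# Boxes are filled areas, so the fill scale controls them
p_box <- ggplot(mpg, aes(drv, hwy)) +
  geom_boxplot(aes(fill = drv)) +
  scale_fill_manual(values = drv_cols)

# Points are drawn with the colour aesthetic, so the colour scale controls them
p_pts <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(aes(col = drv)) +
  scale_colour_manual(values = drv_cols)

p_box
p_pts
```

A scale only affects the aesthetic it is named for, so a fill scale on a plot that maps only colour leaves the default colours in place.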
gapminder <- read.csv("https://ebmwhite.github.io/MATH0216/activities/gapminder.csv")
gapminder %>%
ggplot(aes(x=year, y=lifeExp, group=country)) +
geom_line()
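The group aesthetic is what gives each country its own line. The effect is easier to see on a tiny example; the data frame toy below is made up for illustration and is not part of the gapminder data.

```r
library(ggplot2)

# Hypothetical toy data: two units measured in the same three years
toy <- data.frame(
  unit  = rep(c("A", "B"), each = 3),
  year  = rep(c(2000, 2005, 2010), times = 2),
  value = c(1, 2, 3, 6, 5, 4)
)

# Without group, geom_line() joins all six points into a single zigzag path
p_one <- ggplot(toy, aes(year, value)) +
  geom_line()

# With group, each unit gets its own line
p_two <- ggplot(toy, aes(year, value, group = unit)) +
  geom_line()

p_one
p_two
```

Mapping a discrete variable to color, as in the continent plot, sets the grouping implicitly, so a separate group argument is not needed there.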
gapminder %>%
group_by(continent, year) %>%
summarize(lifeExp = mean(lifeExp)) %>%
ggplot(aes(x=year, y=lifeExp,color=continent)) +
geom_line()
Here are some resources that may be useful quick reference guides for ggplot2: